Efficient Reduction for Wait-Free Termination Detection in a Crash-Prone Distributed System
Abstract
We investigate the problem of detecting termination of a distributed computation in systems where processes can fail by crashing. Specifically, when the communication topology is fully connected, we describe a way to transform any termination detection algorithm A that has been designed for a failure-free environment into a termination detection algorithm B that can tolerate process crashes. Our transformation assumes the existence of a perfect failure detector. We show that a perfect failure detector is in fact necessary to solve the termination detection problem in a crash-prone distributed system even if at most one process can crash. Let μ(n,M) and δ(n,M) denote the message complexity and detection latency, respectively, of A when the system has n processes and the underlying computation exchanges M application messages. The message complexity of B is at most O(n + μ(n, 0)) messages per failure more than the message complexity of A. Also, its detection latency is at most O(δ(n, 0)) per failure more than that of A. Furthermore, the overhead (that is, the amount of control data piggybacked) on an application message increases by only O(log n) bits per failure. The fault-tolerant termination detection algorithm resulting from the transformation satisfies two desirable properties. First, it can tolerate failure of up to n−1 processes, that is, it is wait-free. Second, it does not impose any overhead on the fault-sensitive termination detection algorithm until one or more processes crash, that is, it is fault-reactive. Our transformation can be extended to arbitrary communication topologies provided process crashes do not partition the system.
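The per-failure bounds stated above can be written out cumulatively. This is only a sketch under assumed notation: f (the number of processes that actually crash during the run, with 0 ≤ f ≤ n−1) and the subscripted symbols μ_B and δ_B (the message complexity and detection latency of B) are introduced here for illustration and do not appear in the abstract.

```latex
% Assumed notation (not from the abstract): f = number of crashes, 0 <= f <= n-1;
% \mu_B, \delta_B = message complexity and detection latency of B.
\begin{align*}
\mu_B(n, M)    &\le \mu(n, M)    + O\bigl(f \cdot (n + \mu(n, 0))\bigr), \\
\delta_B(n, M) &\le \delta(n, M) + O\bigl(f \cdot \delta(n, 0)\bigr), \\
\text{piggybacked bits per application message in } B
               &\le \text{piggybacked bits per application message in } A + O(f \log n).
\end{align*}
```

Read this way, every additional term vanishes when f = 0, which is consistent with the fault-reactive property claimed in the abstract: B costs no more than A until a crash occurs.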
Similar resources
Efficient Reductions for Wait-Free Termination Detection in Faulty Distributed Systems
We investigate the problem of detecting termination of a distributed computation in asynchronous systems where processes can fail by crashing. More specifically, for both fully and arbitrarily connected communication topologies, we describe efficient ways to transform any fault-sensitive termination detection algorithm A that has been designed for a failure-free environment into a wait-free ...
On termination detection in crash-prone distributed systems with failure detectors
We investigate the problem of detecting termination of a distributed computation in systems where processes can fail by crashing. Specifically, when the communication topology is fully connected, we describe a way to transform any termination detection algorithm A that has been designed for a failure-free environment into a termination detection algorithm B that can tolerate process crashes. Ou...
Digital Fountains and Their Application to Informed Content Delivery over Adaptive Overlay Networks
Securing the net: challenges, failures and directions; Coterie availability in sites; Keeping denial-of-service attackers in the dark; On conspiracies and hyperfairness in distributed computing; On the availability of non-strict quorum systems; Musical benches; Obstruction-free algorithms can be practically wait-free; Efficient reduction for wait-free terminat...
Survey of Distributed Decision
We survey the recent distributed computing literature on checking whether a given distributed system configuration satisfies a given boolean predicate, i.e., whether the configuration is legal or illegal w.r.t. that predicate. We consider classical distributed computing environments, including mostly synchronous fault-free network computing (LOCAL and CONGEST models), but also asynchronous cras...
Termination Detection in Systems Where Processes May Crash and Recover
An algorithm solving the termination detection problem observes a computation of a distributed system and announces “termination” if the computation has come to an end. This work addresses termination detection in systems where processes fail by crashing and may restart later on. A new definition, robust-restricted termination, which is sensible in the crash-recovery model, is developed. A computation...
Publication date: 2005